13 research outputs found

    A Review of Codebook Models in Patch-Based Visual Object Recognition

    The codebook model-based approach, while ignoring any structural aspects of vision, nonetheless provides state-of-the-art performance on current datasets. The key role of a visual codebook is to map low-level features into a fixed-length vector in histogram space, to which standard classifiers can be directly applied. The discriminative power of such a visual codebook determines the quality of the codebook model, whereas the size of the codebook controls the complexity of the model. Thus, the construction of a codebook is an important step, which is usually done by cluster analysis. However, clustering is a process that retains regions of high density in a distribution, and it follows that the resulting codebook need not have discriminant properties. Clustering is also recognised as a computational bottleneck of such systems. In our recent work, we proposed a resource-allocating codebook (RAC), which constructs a discriminant codebook in a one-pass design procedure and slightly outperforms more traditional approaches at drastically reduced computing times. In this review we survey several approaches proposed over the last decade, covering their feature detectors, descriptors, codebook construction schemes, choices of classifier for recognising objects, and the datasets used to evaluate the proposed methods.
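
    As a point of reference for the pipeline described above, the following is a minimal Python sketch of the usual clustering-based codebook construction and histogram mapping; the variable names and codebook size are illustrative assumptions, not taken from the review:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptors, codebook_size=200, seed=0):
    """Cluster pooled local descriptors; the cluster centres act as codewords."""
    kmeans = KMeans(n_clusters=codebook_size, n_init=10, random_state=seed)
    kmeans.fit(descriptors)            # descriptors: (N, d) array pooled over training images
    return kmeans.cluster_centers_     # (codebook_size, d)

def to_histogram(image_descriptors, codebook):
    """Map an image's variable number of descriptors to a fixed-length histogram."""
    # assign each descriptor to its nearest codeword
    d2 = ((image_descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    assignments = d2.argmin(axis=1)
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    return hist / (hist.sum() + 1e-12)  # normalise so the number of patches does not matter

# usage (hypothetical): pooled = np.vstack(descriptor_list); cb = build_codebook(pooled)
# X = np.stack([to_histogram(d, cb) for d in descriptor_list]); feed X to any classifier
```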

    Designing a resource-allocating codebook for patch-based visual object recognition

    The state-of-the-art approach in visual object recognition is the use of local information extracted at several points or image patches from an image. Local information at specific points can deal with object shape variability and partial occlusions. The underlying idea is that, in different images, the statistical distribution of the patches is different, which can be effectively exploited for recognition. In such a patch-based object recognition system, the key role of a visual codebook is to map the low-level features into a fixed-length vector in histogram space, to which standard classifiers can be directly applied. The discriminative power of a visual codebook determines the quality of the codebook model, whereas the size of the codebook controls the complexity of the model. Thus, the construction of a codebook plays a central role in the model's complexity and is usually done by cluster analysis. However, clustering is a process that retains regions of high density in a distribution, and it follows that the resulting codebook need not have discriminant properties. Clustering is also recognised as a computational bottleneck of such systems. This thesis demonstrates a novel approach, which we call the resource-allocating codebook (RAC), for constructing a discriminant codebook in a one-pass design procedure inspired by the resource-allocation network family of algorithms. The RAC approach slightly outperforms more traditional approaches because it tends to spread the cluster centres over a broader range of the feature space, thereby including rare low-level features that density-preserving clustering-based codebooks miss. Our algorithm achieves this performance at drastically reduced computing times because, apart from an initial scan through a small subset to determine length scales, each data item is processed only once. We illustrate some properties of our method and compare it to a closely related approach known as the mean-shift clustering technique. A pruning strategy is employed to handle outliers when assigning each feature in an image to the closest codeword to create a histogram representation for the image: features whose distance from the closest codeword exceeds an empirical distance maximum are neglected. A recognition system that learns incrementally from training images, with an output classifier accounting for class-specific discriminant features, is also presented. Furthermore, we address an approach which, instead of clustering, adaptively constructs a codebook by computing Fisher scores between the classes of interest. This thesis also demonstrates a novel sequential hierarchical clustering technique that initially builds a hierarchical tree from a small subset of the data, while the remaining data are processed sequentially and the tree is adapted constructively. Evaluations of this approach show that its performance is comparable while the computational needs are reduced. Finally, for the classification stage, we demonstrate a new learning architecture for multi-class classification tasks using support vector machines. This technique is faster in testing than directed acyclic graph (DAG) SVMs, while maintaining performance comparable to standard multi-class classification techniques.
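
    Since the abstract describes RAC only at a high level, here is a minimal Python sketch of a one-pass, distance-threshold codebook in that spirit; the length-scale threshold tau and the pruning distance d_max are assumptions for illustration, not values or code from the thesis:

```python
import numpy as np

def rac_codebook(descriptor_stream, tau):
    """One-pass allocation: a descriptor far from every existing codeword
    becomes a new codeword, so rare regions of feature space are retained."""
    codebook = []
    for x in descriptor_stream:            # each descriptor is seen exactly once
        if not codebook:
            codebook.append(x)
            continue
        dists = np.linalg.norm(np.asarray(codebook) - x, axis=1)
        if dists.min() > tau:              # novelty test: allocate a new codeword
            codebook.append(x)
    return np.asarray(codebook)

def histogram_with_pruning(image_descriptors, codebook, d_max):
    """Assign each descriptor to its nearest codeword, ignoring outliers whose
    distance to the closest codeword exceeds the empirical maximum d_max."""
    hist = np.zeros(len(codebook))
    for x in image_descriptors:
        dists = np.linalg.norm(codebook - x, axis=1)
        j = dists.argmin()
        if dists[j] <= d_max:              # pruning strategy mentioned in the abstract
            hist[j] += 1
    return hist / (hist.sum() + 1e-12)

# tau might be estimated from an initial scan of a small subset, for example as a
# fraction of the median pairwise distance between descriptors in that subset
```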

    An Efficient BoF Representation for Object Classification

    The Bag-of-Features (BoF) approach has proved to yield good performance in patch-based object classification systems owing to its simplicity. However, the very large number of patch-based descriptors (such as scale-invariant feature transform and speeded-up robust features) extracted from images to create a BoF vector often leads to huge computational cost and increased storage requirements. This paper demonstrates a two-stage approach to creating a discriminative and compact BoF representation for object classification. As a pre-processing stage to codebook construction, ambiguous patch-based descriptors are eliminated using an entropy-based, one-pass feature selection approach, so that only high-quality descriptors are retained. As a post-processing stage to codebook construction, codewords which are not activated often enough in images are eliminated from the initially constructed codebook based on statistical measures. Finally, each patch-based descriptor of an image is assigned to the closest codeword to create a histogram representation. A one-versus-all support vector machine is applied to classify the histogram representations. The proposed methods are evaluated on benchmark image datasets; test results show that they make the codebook more discriminative and compact in moderate-sized visual object classification tasks.
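
    As a rough illustration of the post-processing stage and classifier described above, the sketch below drops codewords that are rarely activated across training images and trains a one-versus-all linear SVM on the reduced histograms; the activation threshold is a hypothetical choice, and the entropy-based pre-selection stage is not reproduced here:

```python
import numpy as np
from sklearn.svm import LinearSVC

def prune_codebook(histograms, min_images=5):
    """Return a mask keeping codewords activated (non-zero) in at least min_images images."""
    activation_counts = (histograms > 0).sum(axis=0)   # per-codeword document frequency
    return activation_counts >= min_images

# histograms: (num_images, codebook_size) BoF vectors; labels: (num_images,) class ids
# keep = prune_codebook(histograms)
# compact = histograms[:, keep]                        # compact BoF representation
# clf = LinearSVC(C=1.0).fit(compact, labels)          # LinearSVC is one-vs-rest by default
```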

    Resource-Allocating Codebook for patch-based face recognition

    In this paper we propose a novel approach to constructing a discriminant visual codebook in a simple and extremely fast one-pass procedure, which we call the Resource-Allocating Codebook (RAC), inspired by the Resource Allocating Network (RAN) algorithms developed in the artificial neural networks literature. Unlike density-preserving clustering, this approach retains data spread more widely over the input space, thereby including rare low-level features in the codebook. We show that the codebook constructed by the RAC technique outperforms a codebook constructed by K-means clustering in both recognition performance and computation time on two standard face databases, namely the AT&T and Yale face datasets, using SIFT features.
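
    For concreteness, SIFT descriptors of the kind used in these experiments can be extracted per image roughly as follows (a sketch using OpenCV; the handling of images with no detected keypoints is an illustrative choice, not taken from the paper):

```python
import cv2
import numpy as np

def sift_descriptors(image_path):
    """Return the (num_keypoints, 128) SIFT descriptor matrix for one image."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    _keypoints, descriptors = sift.detectAndCompute(gray, None)
    return descriptors if descriptors is not None else np.empty((0, 128))

# The pooled descriptors of all training faces can then be fed to either a K-means
# or an RAC codebook builder, and each face is classified from its histogram.
```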

    A Multi-staged Feature-Attentive Network for Fashion Clothing Classification and Attribute Prediction

    In visual fashion clothing analysis, many researchers have been drawn to the success of deep learning. In this work, we introduce a multi-staged feature-attentive network for clothing category classification and attribute prediction. The proposed network is landmark-independent, whereas existing landmark-dependent structures require substantial manual effort for landmark annotation and also suffer from inter- and intra-individual variability. Our focus in this work is on strengthening feature extraction by fusing low-level and high-level features within the fashion network. We target multi-level contextual features that utilise spatial and channel-wise information to create contextual feature supervision. Further, we incorporate a semi-supervised learning approach that improves fashion clothing analysis by sharing knowledge between labelled and unlabelled data. To the best of our knowledge, this is the first attempt to investigate semi-supervised learning in fashion clothing analysis with a multitask architecture that simultaneously learns clothing categories and their attributes. We evaluated the proposed approach on the large-scale DeepFashion-C dataset, with unlabelled data obtained from six publicly available fashion datasets. Experimental results show that the proposed deep convolutional neural network architectures for supervised and semi-supervised learning considerably outperform many state-of-the-art techniques in fashion clothing analysis.
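
    The abstract does not give architectural details, so the following is only a schematic PyTorch sketch of the general pattern it describes: channel-wise and spatial attention applied to a backbone feature map, followed by one head for category classification and one for multi-label attribute prediction. All layer sizes, the category count and the attribute count are illustrative assumptions rather than the paper's design:

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Re-weight a feature map along channels, then along spatial positions."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
        self.spatial_conv = nn.Conv2d(1, 1, kernel_size=7, padding=3)

    def forward(self, x):                                  # x: (B, C, H, W)
        w = self.channel_mlp(x.mean(dim=(2, 3)))           # channel attention weights
        x = x * w[:, :, None, None]
        s = torch.sigmoid(self.spatial_conv(x.mean(dim=1, keepdim=True)))
        return x * s                                       # spatially re-weighted features

class MultiTaskHead(nn.Module):
    """Shared attentive features feed a category head and an attribute head."""
    def __init__(self, channels, num_categories=50, num_attributes=1000):
        super().__init__()
        self.attend = ChannelSpatialAttention(channels)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.category = nn.Linear(channels, num_categories)    # single-label classification
        self.attributes = nn.Linear(channels, num_attributes)  # multi-label (sigmoid) prediction

    def forward(self, feats):
        z = self.pool(self.attend(feats)).flatten(1)
        return self.category(z), self.attributes(z)

# usage (hypothetical): feats = backbone(images)   # e.g. (B, 512, 7, 7) CNN feature map
# category_logits, attribute_logits = MultiTaskHead(512)(feats)
```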

    ELT teachers working in underprivileged districts of Turkey and their perspective of continuous professional development opportunities

    The purpose of this research is to investigate ELT teachers' perspectives on continuous professional development under the restricted conditions in which they are obliged to teach English. To this end, 20 EFL teachers who have been working at different state schools in remote villages of the southeast region of Turkey were selected as participants. The data were collected in two stages. First, all the teachers in the study were interviewed individually to answer the questions in the first part of the survey. Then, 12 randomly selected teachers were divided into two groups for online focus-group discussions, which were held in a semi-structured format with leading questions adapted from Brown (2013). The data collection processes were audio-recorded, and the qualitative data obtained from the interviews and focus-group discussions were coded and clustered into specific themes. The results revealed that teachers working in underprivileged districts of Turkey: (a) believe in the crucial contribution of professional development activities to teacher quality and/or student achievement, (b) have very limited opportunities and options for sustained professional development, and (c) think that the time allocated for professional development activities should be increased. Moreover, these teachers (d) hope to take part in the planning phase of professional development activities, and (e) are dissatisfied with their unfair financial situation. Upon analysis of these items, male teachers were found to focus on more general ideas than female teachers (items a and b), whereas female teachers focused on specific ideas statistically more often than male teachers (items c and d). No statistically significant gender difference was found with respect to financial limitations (item e). The results of the research might be generalized to similar contexts in other underprivileged districts, and similar studies might be conducted with teachers of other fields in order to reach results that are more generalizable.

    Transformers in Single Object Tracking: An Experimental Survey

    Single-object tracking is a well-known and challenging research topic in computer vision. Over the last two decades, numerous researchers have proposed various algorithms to solve this problem and achieved promising results. Recently, Transformer-based tracking approaches have ushered in a new era in single-object tracking by introducing new perspectives and achieving superior tracking robustness. In this paper, we conduct an in-depth literature analysis of Transformer tracking approaches by categorizing them into CNN-Transformer based trackers, Two-stream Two-stage fully-Transformer based trackers, and One-stream One-stage fully-Transformer based trackers. In addition, we conduct experimental evaluations to assess their tracking robustness and computational efficiency using publicly available benchmark datasets. Furthermore, we measure their performance in different tracking scenarios to identify their strengths and weaknesses in particular situations. Our survey provides insights into the underlying principles of Transformer tracking approaches, the challenges they encounter, and the future directions they may take. (Comment: 36 pages, 22 figures, review paper, submitted to IEEE Access, updated with CVPR-2023 paper.)
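
    To make the taxonomy concrete, the following PyTorch sketch contrasts the two fully-Transformer designs named above: a two-stream tracker encodes template and search tokens separately and fuses them afterwards, while a one-stream tracker concatenates the tokens and encodes them jointly. This is a schematic illustration only, not any specific tracker covered by the survey:

```python
import torch
import torch.nn as nn

def encoder_layer(dim):
    return nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)

class TwoStreamTracker(nn.Module):
    """Separate encoders for template and search tokens, fused afterwards."""
    def __init__(self, dim=256):
        super().__init__()
        self.template_enc = nn.TransformerEncoder(encoder_layer(dim), num_layers=2)
        self.search_enc = nn.TransformerEncoder(encoder_layer(dim), num_layers=2)
        self.fusion = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, template_tokens, search_tokens):
        t = self.template_enc(template_tokens)
        s = self.search_enc(search_tokens)
        fused, _ = self.fusion(query=s, key=t, value=t)     # search attends to template
        return fused                                         # fed to a box-prediction head

class OneStreamTracker(nn.Module):
    """Template and search tokens are concatenated and encoded jointly."""
    def __init__(self, dim=256):
        super().__init__()
        self.encoder = nn.TransformerEncoder(encoder_layer(dim), num_layers=4)

    def forward(self, template_tokens, search_tokens):
        joint = torch.cat([template_tokens, search_tokens], dim=1)
        out = self.encoder(joint)
        return out[:, template_tokens.size(1):]              # keep the search-region tokens

# tokens come from patch embeddings, e.g. (B, num_patches, 256) tensors
```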

    Target-specific Siamese attention network for real-time object tracking

    Deep similarity trackers are able to track at above real-time speed. However, their accuracy is considerably lower than that of deep classification-based trackers, since they forgo valuable online cues. To feed target-specific information into real-time object tracking, we propose a novel Siamese attention network. Different types of attention mechanism are used to capture different contexts of target information, and the learned knowledge is then used to feed target cues into similarity tracking at different representation levels. In addition, an online learning mechanism is employed to utilise the available target-specific data. The proposed tracker reduces the impact of noise in the target template and improves the accuracy of similarity tracking by feeding target cues into the similarity search. Extensive evaluations performed on the OTB-2013/50/100 and VOT2018 benchmark datasets demonstrate that the proposed tracker outperforms state-of-the-art approaches while maintaining real-time tracking speed.
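
    A minimal sketch of the general idea, namely a Siamese cross-correlation similarity search in which the target template features are re-weighted by a channel-attention module before matching; the layer sizes are assumed and this is not the authors' network:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseAttentionTracker(nn.Module):
    """Siamese similarity search where the target template features are
    re-weighted by channel attention before cross-correlation."""
    def __init__(self, channels=256, reduction=8):
        super().__init__()
        self.backbone = nn.Sequential(                 # shared embedding, illustrative only
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU())
        self.attention = nn.Sequential(                # target-specific channel attention
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, template_img, search_img):       # batch size 1 assumed
        t = self.backbone(template_img)                # (1, C, h, w) target template features
        s = self.backbone(search_img)                  # (1, C, H, W) search-region features
        w = self.attention(t.mean(dim=(2, 3)))         # emphasise target-specific channels
        t = t * w[:, :, None, None]
        # cross-correlation: slide the attended template over the search features
        response = F.conv2d(s, t)                      # (1, 1, H-h+1, W-w+1) similarity map
        return response

# the peak of the response map gives the target location; an online update of the
# attention weights could exploit newly available target-specific data
```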